


Chapter 2 Data Processing



Organizing and presenting raw data is the initial step in making it comprehensible and ready for analysis. Various statistical techniques are then used to extract meaningful insights from the data. This chapter introduces key statistical techniques for data analysis in geography.

These techniques are broadly categorized into three types:

1. Measures of Central Tendency

2. Measures of Dispersion

3. Measures of Relationship


Measures of Central Tendency provide a single value that represents the typical or central value of a dataset.

Measures of Dispersion describe how spread out or varied the data points are, often in relation to the central value.

Measures of Relationship (like correlation) quantify the degree of association or interdependence between two or more variables.

Measures Of Central Tendency

Geographical characteristics such as rainfall amounts, elevation, population density, educational attainment levels, or age groups all show variation. To understand these variations collectively, we often seek a single representative value that best summarizes the entire set of observations.

This representative value usually lies near the centre of the data distribution. Statistical methods used to find this central point are called measures of central tendency, also known as statistical averages.

The most common measures of central tendency are the Mean, Median, and Mode. Each provides a different way of identifying a central representative value and is suited to different types of data.

Mean

The mean is the arithmetic average of a dataset. It is calculated by summing all the values in the dataset and dividing the sum by the total number of observations. The method of calculating the mean differs slightly for ungrouped and grouped data, and can be done using either direct or indirect methods.


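Computing Mean from Ungrouped Data:

Direct method: add all the observations (x) and divide by their number (N).

$ \bar{X} = \frac{\sum x}{N} $

Indirect method: subtract a convenient assumed mean (A) from each observation to get its deviation $d = x - A$, then correct the assumed mean by the average deviation.

$ \bar{X} = A + \frac{\sum d}{N} $


Computing Mean from Grouped Data:

Each class is represented by its midpoint (x) and weighted by its frequency (f), with $N = \sum f$.

$ \bar{X} = \frac{\sum fx}{N} $ (direct method)

$ \bar{X} = A + \frac{\sum fd}{N} $ (indirect method, with $d = x - A$)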
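The direct and indirect methods mentioned above can be sketched in Python as follows. This is only an illustration: the function names and the assumed mean of 8,000 are arbitrary choices, and the sample inputs are borrowed from Example 2.3 (peak heights) and Example 2.4 (frequency distribution) later in this chapter.

```python
# Sketch of the direct and indirect (assumed-mean) methods.
# Function names and the assumed mean of 8,000 are illustrative choices.

def mean_direct(values):
    """Direct method: sum of observations divided by their number."""
    return sum(values) / len(values)


def mean_indirect(values, assumed_mean):
    """Indirect method: assumed mean plus the average deviation from it."""
    deviations = [v - assumed_mean for v in values]
    return assumed_mean + sum(deviations) / len(values)


def mean_grouped(classes):
    """Direct method for grouped data.

    classes is a list of (lower_limit, upper_limit, frequency) tuples;
    each class is represented by its midpoint.
    """
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


peaks_m = [8126, 8611, 7817, 8172, 8076, 8848, 8598]
distribution = [(50, 60, 3), (60, 70, 7), (70, 80, 11),
                (80, 90, 16), (90, 100, 8), (100, 110, 5)]

print(round(mean_direct(peaks_m), 2))          # 8321.14
print(round(mean_indirect(peaks_m, 8000), 2))  # 8321.14
print(round(mean_grouped(distribution), 2))    # 81.8
```

Both methods return the same mean for the peak heights; the indirect method only makes the hand arithmetic smaller.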

Median

The median is a positional average. It is the value that divides a dataset, when arranged in ascending or descending order, into two equal halves. It is not affected by the actual values of extreme observations, only by their position.


Computing Median for Ungrouped Data:

Arrange the data in ascending or descending order. The median is the value of the middle observation. The position of the median is found using the formula:

$ \text{Position of Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} $

Where N is the number of observations.

If N is odd, the median is the value at this position. If N is even, the median is the average of the values at the two middle positions (N/2 and (N/2)+1).

Example 2.3: Calculate the median height of the following mountain peaks: 8,126 m, 8,611 m, 7,817 m, 8,172 m, 8,076 m, 8,848 m, 8,598 m.

Answer:

Arrange in ascending order: 7,817; 8,076; 8,126; 8,172; 8,598; 8,611; 8,848.

N = 7.

Position of Median = $(7+1)/2 = 4^{\text{th}}$ item.

The 4th item in the arranged series is 8,172 m.

$ \text{Median} (M) = 8,172 \text{ m} $
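The positional rule used in Example 2.3 can be checked with a short Python sketch. The function name `median_ungrouped` is a hypothetical helper, not part of any library.

```python
def median_ungrouped(values):
    """Median of raw data: the middle item of the sorted series."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)-th item, i.e. index mid when counting from 0.
        return ordered[mid]
    # Even N: average of the (N/2)-th and (N/2 + 1)-th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


peaks_m = [8126, 8611, 7817, 8172, 8076, 8848, 8598]
print(median_ungrouped(peaks_m))  # 8172
```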


Computing Median for Grouped Data:

For grouped data, the median is calculated using the cumulative frequency distribution to find the class where the median lies (the median class). The formula is:

$ M = l + \frac{\frac{N}{2} - c}{f} \times i $

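Where l is the lower limit of the median class, N is the total frequency ($\sum f$), c is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and i is the width of the class interval.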

Example 2.4: Calculate the median for the following frequency distribution:

| Class | Frequency (f) |
|---|---|
| 50-60 | 3 |
| 60-70 | 7 |
| 70-80 | 11 |
| 80-90 | 16 |
| 90-100 | 8 |
| 100-110 | 5 |

Answer:

Calculate cumulative frequencies (F) and find the median position (N/2).

| Class | Frequency (f) | Cumulative Frequency (F) | Calculation of Median Class |
|---|---|---|---|
| 50-60 | 3 | 3 | |
| 60-70 | 7 | 10 | |
| 70-80 | 11 | 21 (c) | |
| 80-90 | 16 (f) | 37 | Median group (N/2 = 25 falls here) |
| 90-100 | 8 | 45 | |
| 100-110 | 5 | 50 | |
| Total | N = $\sum f = 50$ | | |

$ N/2 = 50/2 = 25 $. The cumulative frequency next greater than 25 is 37, which falls in the 80-90 class. So, the median class is 80-90.

l = 80, N = 50, c = 21 (cumulative frequency of the class before 80-90), f = 16 (frequency of 80-90 class), i = 10 (class interval width).

$ M = 80 + \frac{25 - 21}{16} \times 10 = 80 + \frac{4}{16} \times 10 = 80 + \frac{1}{4} \times 10 = 80 + 2.5 = 82.5 $
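The grouped-median calculation of Example 2.4 can be sketched in Python as below. The function name and the tuple layout `(lower_limit, upper_limit, frequency)` are assumptions made for this illustration; classes are expected in ascending order.

```python
def median_grouped(classes):
    """Median of a frequency distribution: M = l + ((N/2 - c) / f) * i."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the distribution has no observations")
    half = n / 2
    cumulative = 0  # c: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches N/2.
            width = upper - lower
            return lower + (half - cumulative) / f * width
        cumulative += f


distribution = [(50, 60, 3), (60, 70, 7), (70, 80, 11),
                (80, 90, 16), (90, 100, 8), (100, 110, 5)]
print(median_grouped(distribution))  # 82.5
```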

Mode

The mode is the value that appears most frequently in a dataset. It is denoted by Z or $M_0$. The mode is used less often than the mean or median.


Computing Mode for Ungrouped Data:

For ungrouped data, arrange the values in ascending or descending order and count how often each one occurs; the value that occurs most often is the mode.

Example 2.5: Calculate mode for test scores: 61, 10, 88, 37, 61, 72, 55, 61, 46, 22.

Answer:

Arrange in ascending order: 10, 22, 37, 46, 55, 61, 61, 61, 72, 88.

The score 61 occurs 3 times, more than any other score.

$ \text{Mode} (Z) = 61 $ (Unimodal - one mode)

Example 2.6: Calculate mode for test scores: 82, 11, 57, 82, 08, 11, 82, 95, 41, 11.

Answer:

Arrange in ascending order: 08, 11, 11, 11, 41, 57, 82, 82, 82, 95.

The scores 11 and 82 both occur 3 times, which is the highest frequency.

$ \text{Mode} (Z) = 11 \text{ and } 82 $ (Bimodal - two modes)

If three values have the same highest frequency, the distribution is trimodal. If many values have the same highest frequency, it's multimodal. If no value is repeated, there is no mode.
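The counting procedure for the mode, including the bimodal and no-mode cases, can be sketched with Python's `collections.Counter`. The helper name `modes` is hypothetical, and returning an empty list for "no mode" is a design choice of this sketch.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in ascending order.

    An empty list means no value is repeated, so there is no mode.
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


scores_a = [61, 10, 88, 37, 61, 72, 55, 61, 46, 22]  # Example 2.5
scores_b = [82, 11, 57, 82, 8, 11, 82, 95, 41, 11]   # Example 2.6
print(modes(scores_a))  # [61]
print(modes(scores_b))  # [11, 82]
```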

Comparison Of Mean, Median And Mode

The relationship between mean, median, and mode can be visualized using a frequency distribution curve.

In a normal distribution, the frequency distribution is symmetrical and bell-shaped. In a perfect normal distribution, the mean, median, and mode all coincide and are located at the peak of the curve, representing the central value with the highest frequency.

[Figure: Normal distribution curve with the mean, median, and mode at the centre]

However, if the data distribution is not symmetrical but skewed (pushed towards one end), the mean, median, and mode will not coincide.

The choice of which measure of central tendency to use depends on the data type and distribution. The mean is sensitive to extreme values. The median is less affected by extreme values and is suitable for skewed distributions. The mode is useful for categorical data or identifying the most common value, but can be unstable and may not exist or be unique.
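The sensitivity of the mean to extreme values can be seen in a small Python sketch using the standard `statistics` module. The data are the daily wages from Example 2.7 in this chapter; dropping the last wage (Rs. 100) is an illustrative step added here, not part of that example.

```python
import statistics

wages = [40, 42, 45, 48, 50, 52, 55, 58, 60, 100]
without_extreme = wages[:-1]  # same wages with the Rs. 100 observation removed

# The single extreme wage shifts the mean by 5 but the median by only 1.
print(statistics.mean(wages), statistics.median(wages))                      # 55 51.0
print(statistics.mean(without_extreme), statistics.median(without_extreme))  # 50 50
```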



Measures Of Dispersion

Measures of central tendency alone do not fully describe a dataset. They tell us the centre but not how the data points are spread out around that centre. Dispersion (or variability) refers to the scattering or spread of scores or measurements within a distribution.

Using measures of dispersion alongside central tendency provides a better understanding of the distribution's characteristics, such as its homogeneity or variability.

Dispersion serves two main purposes: understanding the composition of a distribution and comparing the stability or homogeneity of different distributions.


Common methods for measuring dispersion are the Range, Quartile Deviation, Mean Deviation, Standard Deviation, and Coefficient of Variation. Of these, the Range and Standard Deviation (absolute measures) and the Coefficient of Variation (a relative measure) are widely used; Quartile Deviation and Mean Deviation are less common.

Range

The range (R) is the simplest measure of dispersion, calculated as the difference between the highest (L) and lowest (S) values in a dataset.

$ R = L - S $

Example 2.7: Calculate the range for daily wages: Rs. 40, 42, 45, 48, 50, 52, 55, 58, 60, 100.

Answer:

Highest value (L) = 100, Lowest value (S) = 40.

$ R = 100 - 40 = 60 $

The range is highly influenced by extreme values and is considered an unstable measure of dispersion, similar to how the mode is an unstable measure of central tendency.
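A short Python sketch shows how a single extreme value drives the range. The function name `value_range` is a hypothetical helper, and removing the Rs. 100 wage is an added illustration using the data of Example 2.7.

```python
def value_range(values):
    """Range R = L - S: highest value minus lowest value."""
    return max(values) - min(values)


wages = [40, 42, 45, 48, 50, 52, 55, 58, 60, 100]
print(value_range(wages))       # 60
print(value_range(wages[:-1]))  # 20, once the Rs. 100 wage is left out
```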

Standard Deviation

The standard deviation (SD) is the most common and stable measure of dispersion. It is calculated around the mean and represents the typical distance of data points from the mean. It is defined as the square root of the variance.

The Greek letter $\sigma$ (sigma) often denotes Standard Deviation for a population, while 's' or SD is used for a sample.

The formula for Standard Deviation for ungrouped data is:

$ s = \sqrt{\frac{\sum x^2}{N}} $

Where $x$ is the deviation of each score from the mean ($x = X - \bar{X}$) and $x^2$ is the squared deviation.

The quantity $\frac{\sum x^2}{N}$ under the square root is called the variance ($s^2$); the standard deviation is its square root.


Computing Standard Deviation for Ungrouped Data:

Example 2.8: Calculate the standard deviation for scores: 01, 03, 05, 07, 09.

Answer:

First, calculate the mean ($\bar{X}$).

$ \bar{X} = (1+3+5+7+9)/5 = 25/5 = 5 $

Calculate deviations from the mean (x) and squared deviations (x$^2$).

| X (Score) | $x = X - \bar{X}$ (Deviation from Mean) | $x^2$ (Squared Deviation) |
|---|---|---|
| 1 | $1 - 5 = -4$ | $(-4)^2 = 16$ |
| 3 | $3 - 5 = -2$ | $(-2)^2 = 4$ |
| 5 | $5 - 5 = 0$ | $(0)^2 = 0$ |
| 7 | $7 - 5 = 2$ | $(2)^2 = 4$ |
| 9 | $9 - 5 = 4$ | $(4)^2 = 16$ |
| $\sum X = 25$ | $\sum x = 0$ (check: the deviations sum to zero) | $\sum x^2 = 40$ |

N = 5.

$ s = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{40}{5}} = \sqrt{8} \approx 2.83 $
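The steps of Example 2.8 can be reproduced with a minimal Python sketch. The function name `std_dev` is hypothetical; the result is cross-checked against `statistics.pstdev`, the standard-library function that also divides by N.

```python
import math
import statistics


def std_dev(values):
    """Standard deviation with divisor N: sqrt(sum of squared deviations / N)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / n)


scores = [1, 3, 5, 7, 9]
print(round(std_dev(scores), 2))            # 2.83
print(round(statistics.pstdev(scores), 2))  # 2.83
```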


Computing Standard Deviation for Grouped Data:

For grouped data, a simplified calculation method similar to the indirect method for mean is often used.

$ s = i \times \sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} $

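Where i is the width of the class interval, f is the class frequency, $u = (x - A)/i$ is the simplified deviation of each class midpoint x from the assumed mean A, and $N = \sum f$.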

Example: Calculate the standard deviation for the following distribution:

| Groups | Frequency (f) |
|---|---|
| 120-130 | 2 |
| 130-140 | 4 |
| 140-150 | 6 |
| 150-160 | 12 |
| 160-170 | 10 |
| 170-180 | 6 |

Answer:

Calculate midpoints, choose an assumed mean, calculate simplified deviations (u), fu, and fu$^2$. Assumed mean (A) = 155 (midpoint of 150-160), interval (i) = 10.

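| Group | f | Midpoint (x) | $u = (x - 155) / 10$ | fu | $u^2$ | $fu^2$ |
|---|---|---|---|---|---|---|
| 120-130 | 2 | 125 | -3 | -6 | 9 | 18 |
| 130-140 | 4 | 135 | -2 | -8 | 4 | 16 |
| 140-150 | 6 | 145 | -1 | -6 | 1 | 6 |
| 150-160 | 12 | 155 | 0 | 0 | 0 | 0 |
| 160-170 | 10 | 165 | 1 | 10 | 1 | 10 |
| 170-180 | 6 | 175 | 2 | 12 | 4 | 24 |
| Total | N = 40 | | | $\sum fu = 2$ | | $\sum fu^2 = 74$ |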

$ s = 10 \times \sqrt{\frac{74}{40} - \left(\frac{2}{40}\right)^2} = 10 \times \sqrt{1.85 - (0.05)^2} = 10 \times \sqrt{1.85 - 0.0025} = 10 \times \sqrt{1.8475} \approx 10 \times 1.359 = 13.59 $
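The grouped calculation above can be sketched in Python as follows. The function name and the `(lower_limit, upper_limit, frequency)` tuple layout are assumptions of this illustration. The second call uses a different assumed mean (145) to suggest that the choice of A does not change the result.

```python
import math


def std_dev_grouped(classes, assumed_mean, width):
    """s = i * sqrt(sum(f*u^2)/N - (sum(f*u)/N)^2), with u = (x - A) / i."""
    n = sum(f for _, _, f in classes)
    sum_fu = 0.0
    sum_fu2 = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2
        u = (midpoint - assumed_mean) / width  # simplified deviation
        sum_fu += f * u
        sum_fu2 += f * u * u
    return width * math.sqrt(sum_fu2 / n - (sum_fu / n) ** 2)


groups = [(120, 130, 2), (130, 140, 4), (140, 150, 6),
          (150, 160, 12), (160, 170, 10), (170, 180, 6)]
print(round(std_dev_grouped(groups, 155, 10), 2))  # 13.59
print(round(std_dev_grouped(groups, 145, 10), 2))  # 13.59
```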

Coefficient Of Variation (CV)

The Coefficient of Variation (CV) is a relative measure of dispersion. It is particularly useful for comparing the variability of datasets that are expressed in different units of measurement or have vastly different means. CV expresses the standard deviation as a percentage of the mean.

$ \text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100 $

$ \text{CV} = \frac{s}{\bar{X}} \times 100 $

Example: Calculate the CV for the dataset in Example 2.8 ($\bar{X} = 5$, $s \approx 2.83$).

Answer:

$ \text{CV} = \frac{2.83}{5} \times 100 = 0.566 \times 100 = 56.6\% $

A higher CV indicates greater relative variability or dispersion compared to the mean.
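The CV formula can be sketched in Python using `statistics.pstdev`, which divides by N as in this chapter. The function name is hypothetical. The code carries the standard deviation at full precision, so it gives 56.57 rather than the 56.6 obtained above from the rounded value 2.83.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, using the divisor-N deviation."""
    mean = sum(values) / len(values)
    return statistics.pstdev(values) / mean * 100


scores = [1, 3, 5, 7, 9]  # the scores of Example 2.8
print(round(coefficient_of_variation(scores), 2))  # 56.57
```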



Measures Of Relationship

Measures of relationship explore the association or interdependence between two or more variables. When changes in one variable are associated with changes in another, we say they are related or correlated. Correlation is a measure of this relationship.

Correlation describes both the nature (direction) and strength (degree) of the relationship between variables.

Direction Of Correlation

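The direction of correlation indicates whether variables change together in the same direction or in opposite directions. In a positive correlation, both variables increase or decrease together; in a negative correlation, one increases as the other decreases.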

A scatter plot visually shows the relationship between two variables. In a scatter plot, if points tend to rise from lower left to upper right, it indicates positive correlation. If points tend to fall from upper left to lower right, it indicates negative correlation. If points are scattered randomly with no clear pattern, it indicates no correlation.

[Figure: Scatter plot showing a perfect positive linear relationship]
[Figure: Scatter plot showing a perfect negative linear relationship]
[Figure: Scatter plot showing no clear linear relationship between variables]

Degree Of Correlation

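The degree or strength of correlation measures how closely the two variables are related. It is expressed numerically by the correlation coefficient, which always lies between -1.00 and +1.00 and can never exceed 1 in either direction.

A coefficient of +1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates no correlation. Values between 0 and $\pm 1$ indicate varying degrees of relationship: the closer the coefficient is to $\pm 1$, the stronger the association.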

[Figure: Diagram showing the range of correlation coefficients from -1 to +1]

Spearman’s Rank Correlation

Spearman's Rank Correlation, denoted by $r_s$ or $\rho$ (rho), is a non-parametric method used to measure the degree of association between two variables based on their ranks rather than their raw values. It is particularly useful when data is ordinal or when the number of observations is small.

The formula for Spearman's Rank Correlation is:

$ r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $

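Where D is the difference between the two ranks of each pair of observations and N is the number of paired observations.


Steps for Calculation:

1. Rank the X scores (XR) and the Y scores (YR) separately, giving rank 1 to the highest score.

2. Find the difference between the two ranks of each pair (D) and square it (D$^2$).

3. Add the squared differences to obtain $\sum D^2$.

4. Substitute $\sum D^2$ and N into the formula.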

Example 2.9: Calculate Spearman’s Rank Correlation for the given scores in Economics (X) and Geography (Y).

| Economics (X) | Geography (Y) |
|---|---|
| 02 | 04 |
| 08 | 12 |
| 00 | 06 |
| 20 | 24 |
| 12 | 16 |
| 16 | 18 |
| 06 | 08 |
| 18 | 20 |
| 09 | 09 |
| 10 | 10 |

Answer:

Rank each set of scores from highest (rank 1) to lowest, then compute the rank differences (D) and their squares (D$^2$):

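| X (Score) | Y (Score) | XR (Rank of X) | YR (Rank of Y) | D (Difference in Ranks $\lvert XR - YR \rvert$) | D$^2$ |
|---|---|---|---|---|---|
| 2 | 4 | 9 | 10 | $\lvert 9 - 10 \rvert = 1$ | 1 |
| 8 | 12 | 7 | 5 | $\lvert 7 - 5 \rvert = 2$ | 4 |
| 0 | 6 | 10 | 9 | $\lvert 10 - 9 \rvert = 1$ | 1 |
| 20 | 24 | 1 | 1 | $\lvert 1 - 1 \rvert = 0$ | 0 |
| 12 | 16 | 4 | 4 | $\lvert 4 - 4 \rvert = 0$ | 0 |
| 16 | 18 | 3 | 3 | $\lvert 3 - 3 \rvert = 0$ | 0 |
| 6 | 8 | 8 | 8 | $\lvert 8 - 8 \rvert = 0$ | 0 |
| 18 | 20 | 2 | 2 | $\lvert 2 - 2 \rvert = 0$ | 0 |
| 9 | 9 | 6 | 7 | $\lvert 6 - 7 \rvert = 1$ | 1 |
| 10 | 10 | 5 | 6 | $\lvert 5 - 6 \rvert = 1$ | 1 |
| N = 10 | | | | | $\sum D^2 = 8$ |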

Apply the formula:

$ r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 8}{10(10^2 - 1)} = 1 - \frac{48}{10(100 - 1)} = 1 - \frac{48}{10(99)} = 1 - \frac{48}{990} $

$ r_s = 1 - 0.0485 \approx 0.9515 \approx 0.95 $

The rank correlation coefficient is approximately 0.95, indicating a very strong positive correlation between the scores in Economics and Geography for this group of students.
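The calculation in Example 2.9 can be sketched in Python as below. The helper names are hypothetical. The sketch assumes there are no tied scores within either variable, which holds for this example; it raises an error otherwise, because tied ranks need separate handling that this chapter does not cover.

```python
def rank_descending(values):
    """Rank 1 goes to the highest value. Assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank_correlation(x, y):
    """r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(x)
    ranks_x = rank_descending(x)
    ranks_y = rank_descending(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


economics = [2, 8, 0, 20, 12, 16, 6, 18, 9, 10]
geography = [4, 12, 6, 24, 16, 18, 8, 20, 9, 10]
print(round(spearman_rank_correlation(economics, geography), 2))  # 0.95
```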

Rank correlation is a good alternative when the number of cases is small. For larger datasets, calculating ranks can become cumbersome, and other correlation methods might be more efficient.



Exercises

This section contains exercises covering the calculation and interpretation of measures of central tendency, dispersion, and correlation, allowing students to practice and apply the statistical techniques learned in the chapter.